Asynchronous Q-learning
A Further related works
We now take a moment to discuss a small sample of other related works. The asymptotic convergence of classical Q-learning has long been established (Tsitsiklis, 1994; Jaakkola et al., 1994; Szepesvári, 1997); being model-free, the algorithm enjoys a low space complexity. Finite-time guarantees of other variants of Q-learning have also been developed; partial examples include speedy Q-learning (Azar et al., 2011) and double Q-learning. A common theme is to augment the original model-free update rule (e.g., the Q-learning update rule) with an exploration bonus, which typically takes the form of, say, certain upper confidence bounds (UCBs) motivated by the bandit literature (Lai and Robbins, 1985; Auer and Ortner, 2010). Model-based RL is known to be minimax-optimal in the presence of a simulator (Azar et al., 2013; Agarwal et al., 2020; Li et al., 2020a), beating the state-of-the-art model-free algorithms by achieving optimality for the entire sample size range (Li et al., 2020a). When it comes to online episodic RL, Azar et al. (2017) was the first work that managed to achieve a near minimax-optimal regret bound. The way of constructing hard MDPs in Jaksch et al. (2010) has since been adapted by Jin et al. (2018) to exhibit a lower bound for episodic MDPs (with a sketched proof provided therein).
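To make the "exploration bonus" theme above concrete, the sketch below shows a single Q-learning update augmented with a UCB-style bonus. The function name, the $1/n$ step size, and the exact form of the bonus are illustrative assumptions, not the update rule of any specific paper cited above.

```python
import numpy as np

def ucb_q_update(Q, counts, s, a, r, s_next, gamma=0.9, c_bonus=1.0):
    """One tabular Q-learning update with an illustrative UCB-style
    exploration bonus added to the reward. Q has shape (nS, nA);
    counts tracks per-(s, a) visit counts."""
    counts[s, a] += 1
    n = counts[s, a]
    lr = 1.0 / n                                  # simple 1/n step size
    bonus = c_bonus * np.sqrt(np.log(n + 1) / n)  # optimism term, shrinks with visits
    target = r + bonus + gamma * Q[s_next].max()
    Q[s, a] += lr * (target - Q[s, a])
    return Q
```

The bonus shrinks as a state-action pair is visited more often, so optimism is concentrated on under-explored entries.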
Review for NeurIPS paper: Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Weaknesses: My main concern about the paper is whether the proposed algorithm is actually implementable, due to the specific expression of the (constant) learning rate. I have two concerns: 1. The learning rate depends on t_{mix} in Theorem 1 and on the universal constant c_1 in both Theorem 1 and Theorem 2. How can we compute/approximate t_{mix} in advance? If we cannot, is it sufficient to employ a lower bound on t_{mix}? Looking at the proofs, c_1 is a function of the constant c (Equation 55), which in turn derives from Bernstein's inequality (Equation 81) and subsequently \tilde{c} (Equation 84), but its value is never explicitly computed. I am aware that in [33] as well, the learning rate schedule (which is not constant) depends on \mu_{min} and t_{mix}, but I think the authors should elaborate more on this and explain how to deal with it in practice, if possible.
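The reviewer's question about approximating t_{mix} can be illustrated with a simple heuristic: when the transition matrix and stationary distribution are known, the mixing time can be bounded by iterating the chain until the worst-case total-variation distance to stationarity falls below a threshold. This sketch assumes a known chain, which sidesteps the harder question of estimating t_{mix} from data alone; it is an illustrative aid, not a method from the paper under review.

```python
import numpy as np

def estimate_mixing_time(P, pi, tol=0.25, horizon=10_000):
    """Smallest t with max_s TV(P^t[s, :], pi) <= tol, for a known
    transition matrix P (shape (n, n)) and stationary distribution pi.
    tol = 1/4 is the conventional threshold in the mixing-time definition."""
    n = P.shape[0]
    Pt = np.eye(n)
    for t in range(1, horizon + 1):
        Pt = Pt @ P                                   # t-step transition matrix
        tv = 0.5 * np.abs(Pt - pi).sum(axis=1).max()  # worst-case TV distance
        if tv <= tol:
            return t
    raise RuntimeError("chain did not mix within the horizon")
```

In practice one would plug a conservative (large) surrogate for t_{mix} into the learning rate, trading extra samples for validity of the theoretical guarantee.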
Review for NeurIPS paper: Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
The reviewers appreciated the efforts made by the authors in the rebuttal, and updated their reviews accordingly. The paper contributions are now clear and important (an improved sample complexity analysis of asynchronous Q-learning, and a novel variance reduction algorithm and its analysis). We recommend the paper for acceptance and encourage the authors to account for the reviewers' comments when preparing the camera-ready version of the paper.
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a \gamma -discounted MDP with state space S and action space A, we demonstrate that the \ell_{\infty} -based sample complexity of classical asynchronous Q-learning --- namely, the number of samples needed to yield an entrywise \epsilon -accurate estimate of the Q-function --- is at most on the order of \frac{1}{ \mu_{\min}(1-\gamma)^5 \epsilon^2 } + \frac{ t_{\mathsf{mix}} }{ \mu_{\min}(1-\gamma) } up to some logarithmic factor, provided that a proper constant learning rate is adopted. The first term of this bound matches the complexity in the case with independent samples drawn from the stationary distribution of the trajectory. The second term reflects the expense taken for the empirical distribution of the Markovian trajectory to reach a steady state, which is incurred at the very beginning and becomes amortized as the algorithm runs. Encouragingly, the above bound improves upon the state-of-the-art result by a factor of at least |S||A|.
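The algorithm analyzed in this abstract can be sketched in a few lines: along a single Markovian trajectory, only the visited (s, a) entry is updated at each step, with a constant learning rate. The function signature and the callables `env_step(s, a) -> (r, s_next)` and `behavior(s) -> a` are assumptions for illustration, not an interface from the paper.

```python
import numpy as np

def async_q_learning(env_step, s0, num_steps, gamma, eta, nS, nA, behavior):
    """Minimal sketch of classical asynchronous Q-learning with a
    constant learning rate eta on one trajectory of length num_steps."""
    Q = np.zeros((nS, nA))
    s = s0
    for _ in range(num_steps):
        a = behavior(s)
        r, s_next = env_step(s, a)
        td_target = r + gamma * Q[s_next].max()   # one-sample Bellman target
        Q[s, a] += eta * (td_target - Q[s, a])    # update only the visited entry
        s = s_next
    return Q
```

The contrast with synchronous Q-learning is that here the update order is dictated by the behavior policy and the chain's dynamics, which is exactly why \mu_{\min} and t_{\mathsf{mix}} enter the bound.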
Unified ODE Analysis of Smooth Q-Learning Algorithms
Convergence of Q-learning has been the focus of extensive research over the past several decades. Recently, an asymptotic convergence analysis for Q-learning was introduced using a switching system framework. This approach applies the so-called ordinary differential equation (ODE) approach to prove the convergence of asynchronous Q-learning modeled as a continuous-time switching system, where notions from switching system theory are used to prove its asymptotic stability without explicit Lyapunov arguments. However, to prove stability, restrictive conditions, such as quasi-monotonicity, must be satisfied for the underlying switching systems, which makes it difficult to generalize the analysis method to other reinforcement learning algorithms, such as the smooth Q-learning variants. In this paper, we present a more general and unified convergence analysis that improves upon the switching system approach and can analyze Q-learning and its smooth variants. The proposed analysis is motivated by previous work on the convergence of synchronous Q-learning based on a $p$-norm serving as a Lyapunov function. However, the proposed analysis addresses more general ODE models that can cover both asynchronous Q-learning and its smooth versions with simpler frameworks.
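One common "smooth" variant of the kind this analysis covers replaces the hard max over next-state values with a log-sum-exp (softmax) operator, which recovers the max as the temperature goes to zero. The sketch below is illustrative; the specific operator and names are assumptions, not taken from the paper above.

```python
import numpy as np

def smooth_q_update(Q, s, a, r, s_next, gamma=0.9, eta=0.1, tau=0.1):
    """One tabular Q-learning update where the greedy max is replaced
    by a numerically stable log-sum-exp with temperature tau."""
    m = Q[s_next].max()  # subtract the max for numerical stability
    v_next = m + tau * np.log(np.exp((Q[s_next] - m) / tau).sum())
    Q[s, a] += eta * (r + gamma * v_next - Q[s, a])
    return Q
```

Because log-sum-exp is smooth in Q, the associated ODE has a differentiable right-hand side, which is one reason such variants fall outside the quasi-monotone switching-system conditions but fit a unified ODE treatment.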